16 research outputs found

    Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis

    Directories are a key part of a processor's cache coherence hardware and constitute one of the main bottlenecks in multicore processor scaling, e.g. core count and cache size scaling. Many research efforts have tried to improve the scalability of the directory, but most of them simulate only a few architecture configurations. As CPUs continue to scale, it is important to study the directory's architecture dependency, because directory behavior is highly sensitive to the architecture as well as to the application. Varying the core count directly affects the amount of sharing in the directory, and varying the data cache hierarchy affects the directory access stream. Unfortunately, exploring the huge design space of core counts and cache configurations is challenging with traditional architectural simulation because simulations are slow. This thesis studies the directory using multicore reuse distance analysis. It extends existing multicore reuse distance techniques, developing a method to extract directory access information from the parallel LRU stacks used to acquire private-stack reuse distance profiles. This thesis implements this method in a PIN-based profiler to study directory behavior, including the directory access pattern and directory content, and to analyze current directory techniques. The profile results show that directory accesses are highly dependent on cache size, exhibiting a 3.5x drop when scaling the data cache size from 16KB to 1MB; that sharing causes the ratio of directory entries to cache blocks to drop below 50%; and that the majority of the accesses go to a small percentage of the directory entries. Cache simulations are performed to validate the profiling results, showing the profiled results are within 14.5% of simulation on average. This thesis also analyzes different directory techniques using the insights from the profiler. The case studies on the Cuckoo, DGD, and SCD techniques and on multi-level directories show, respectively, that the required directory size varies significantly with CPU scaling, that the opportunity for compressing private data decreases with cache scaling, that reducing the sharer list size is an effective technique, and that a small L1 directory is sufficient to capture most of the latency-critical accesses.
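The idea of deriving directory accesses from per-core LRU stacks can be illustrated with a minimal Python sketch. It is not the thesis's profiler: it assumes one fully associative private cache per core, treats every private-cache miss as one directory access, and ignores writes, invalidations, and eviction notifications. The function name `directory_accesses` and the toy trace are hypothetical.

```python
def directory_accesses(trace, capacity):
    """Count accesses that miss in a core's private LRU cache.

    trace: iterable of (core_id, block) pairs.
    capacity: number of blocks each private cache can hold (assumed
    fully associative; coherence invalidations are ignored here).
    """
    stacks = {}  # one LRU stack per core, most recently used block last
    count = 0
    for core, block in trace:
        stack = stacks.setdefault(core, [])
        if block in stack:
            # Stack depth = distinct blocks this core touched since its
            # previous access to `block` (the private reuse distance).
            depth = len(stack) - 1 - stack.index(block)
            stack.remove(block)
        else:
            depth = None  # first touch by this core: always a miss
        if depth is None or depth >= capacity:
            count += 1  # miss in the private cache, so the directory is consulted
        stack.append(block)
    return count


# Hypothetical two-core trace
trace = [(0, "a"), (1, "a"), (0, "b"), (0, "a"), (1, "c"), (1, "a"), (0, "b")]
for capacity in (1, 2):
    print(capacity, directory_accesses(trace, capacity))
```

Because the stacks are built once per trace and the capacity only enters as a threshold on the stack depth, a single profiling pass can in principle be queried for many cache sizes, which is what makes this style of analysis cheaper than rerunning a simulation per configuration.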

    Studying the impact of multicore processor scaling on directory techniques via reuse distance analysis

    Researchers have proposed numerous directory techniques to address multicore scalability whose behavior depends on the CPU’s particular configuration, e.g. core count and cache size. As CPUs continue to scale, it is essential to explore the directory’s architecture dependences. However, this is challenging using detailed simulation given the large number of CPU configurations that are possible. This paper proposes to use multicore reuse distance analysis to study coherence directories. We develop a framework to extract the directory access stream from parallel LRU stacks, enabling rapid analysis of the directory’s accesses and contents across both core count and cache size scaling. We also implement our framework in a profiler, and apply it to gain insights into multicore scaling’s impact on the directory. Our profiling results show that directory accesses reduce by 3.5x across data cache size scaling, suggesting that techniques that trade off access latency for reduced capacity or conflicts become increasingly effective as cache size scales. We also show the portion of on-chip memory devoted to the directory cache can be reduced by 53.3% across data cache size scaling, thus lowering the over-provisioning needed at large cache sizes. Finally, we validate our RD-based directory analyses, and find they are within 13% of cache simulations in terms of access count, on average.

    Studying Directory Access Patterns via Reuse Distance Analysis and Evaluating Their Impact on Multi-Level Directory Caches

    The trend for multicore CPUs is towards increasing core count. One of the key limiters to scaling will be the on-chip directory cache. Our work investigates moving portions of the directory away from the cores, perhaps to off-chip DRAM, where ample capacity exists. While such multi-level directory caches exhibit increased latency, several aspects of directory accesses will shield CPU performance from the slower directory, including low access frequency and latency hiding underneath data accesses to main memory. While multi-level directory caches have been studied previously, no work has as yet comprehensively quantified the directory access patterns themselves, making it difficult to understand multi-level behavior in depth. This paper presents a framework based on multicore reuse distance for studying directory cache access patterns. Using our analysis framework, we show that between 69% and 93% of directory entries are looked up only once or twice during their lifetimes in the directory cache, and that between 51% and 71% of dynamic directory accesses are latency tolerant. Using cache simulations, we show a very small L1 directory cache can service 80% of latency-critical directory lookups. Although a significant number of directory lookups and eviction notifications must access the slower L2 directory cache, virtually all of these are latency tolerant.

    Operando Synthesis of High-Curvature Copper Thin Films for CO2 Electroreduction

    As the sole metal that can reduce CO2 to substantial amounts of hydrocarbons, Cu plays an important role in electrochemical CO2 reduction, despite its low energy efficiency. Surface morphology modification is an effective method to improve its reaction activity and selectivity. Unlike pretreatment-based modification methods, which ignore the catalyst's self-reconstruction process, we present operando synthesis by simultaneous electro-dissolution and electro-redeposition of copper during the CO2 electroreduction process. By controlling the cathodic potential and CO2 flow rate, various high-curvature morphologies including microclusters, microspheres, nanoneedles, and nanowhiskers have been obtained, for which the real-time activity and product distribution are analyzed. The best CO2 electroreduction activity and favored C2H4 generation activity, with around 10% faradaic efficiency, can be realized through extensively distributed copper nanowhiskers synthesized under a 40 mL/min flow rate and a −2.1 V potential.

    Studying Multicore Processor Scaling via Reuse Distance Analysis

    The trend for multicore processors is towards increasing numbers of cores, with hundreds of cores, i.e. large-scale chip multiprocessors (LCMPs), possible in the future. The key to realizing the potential of LCMPs is the cache hierarchy, so studying how memory performance will scale is crucial. Reuse distance (RD) analysis can help architects do this. In particular, recent work has developed concurrent reuse distance (CRD) and private reuse distance (PRD) profiles to enable analysis of shared and private caches. Also, techniques have been developed to predict profiles across problem size and core count, enabling the analysis of configurations that are too large to simulate. This paper applies RD analysis to study the scalability of multicore cache hierarchies. We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. We find interference-based locality degradation is more significant than sharing-based locality degradation. For 256 cores running small problems, the former occurs at small cache sizes, allowing moderate capacity scaling of multicore caches to achieve the same cache performance (MPKI) as a single-core cache. At very large problems, interference-based locality degradation increases significantly in many of our benchmarks. For shared caches, this prevents most of our benchmarks from achieving constant-MPKI scaling within a 256 MB capacity budget; for private caches, none of our benchmarks can achieve constant-MPKI scaling within 256 MB.
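For readers unfamiliar with reuse distance, the following minimal Python sketch shows the single-threaded form of the measurement: the number of distinct blocks touched between two accesses to the same block, read off an LRU stack. The CRD and PRD profiles named in the abstract extend this idea to interleaved and per-core streams, which this sketch does not attempt. The function names and the toy trace are hypothetical, and the list-based stack is chosen for clarity rather than speed.

```python
def reuse_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch)."""
    stack = []  # most recently used block is at the end
    distances = []
    for block in trace:
        if block in stack:
            idx = stack.index(block)
            # Distinct blocks referenced since the previous access to `block`
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(block)
    return distances


def misses(trace, capacity):
    """Misses of a fully associative LRU cache holding `capacity` blocks."""
    return sum(1 for d in reuse_distances(trace) if d is None or d >= capacity)


trace = ["a", "b", "c", "a", "b", "d", "a"]
print(reuse_distances(trace))
print([misses(trace, c) for c in (2, 3)])
```

An access hits in a fully associative LRU cache exactly when its reuse distance is smaller than the cache capacity, so one distance profile yields the miss count for every capacity without re-simulating the trace.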


    Neuroprotective Effects of Fingolimod in Mouse Models of Parkinson's Disease

    Parkinson's disease (PD) is characterized by a progressive loss of dopaminergic neurons with limited treatment options. Emerging evidence shows that FTY720 protects against neural injury via modulation of the sphingosine-1-phosphate 1 receptor (S1PR1). However, it remains unclear whether FTY720 could influence neurodegeneration in PD. Therefore, the present study was designed to determine the impact of fingolimod (FTY720), a sphingosine-1-phosphate receptor (S1PR) agonist, on 2 mouse models of PD. We found that FTY720 significantly reduced the deficit of motor function, diminished the loss of tyrosine hydroxylase-positive neurons in the substantia nigra, and attenuated the decrease of striatal dopamine and metabolite levels in mice receiving 6-hydroxydopamine (6-OHDA) or rotenone to simulate PD. An S1PR1-selective antagonist, W146, blocked the neuroprotective effects of FTY720. Of note, FTY720 retained the phosphorylation of ERK, together with a decreased expression of cleaved caspase-3 in mice treated with 6-OHDA or rotenone. In vitro studies revealed that FTY720 also attenuated 6-OHDA- or rotenone-induced toxicity in SH-SY5Y cells. These findings suggest the potential of S1PR modulation as a treatment for PD. Zhao, P., Yang, X., Yang, L., Li, M., Wood, K., Liu, Q., Zhu, X. Neuroprotective effects of fingolimod in mouse models of Parkinson's disease. FASEB J. 31, 172-179 (2017). www.fasebj.org

    Optimization of PbI2/MAPbI3 Perovskite Composites by Scanning Electrochemical Microscopy

    A variety of PbI2/MAPbI3 perovskites were prepared and investigated by a rapid screening technique utilizing a modified scanning electrochemical microscope (SECM) in order to determine how excess PbI2 affects their photoelectrochemical (PEC) properties. An optimum ratio of 2.5% PbI2/MAPbI3 was found to enhance photocurrent over pristine MAPbI3 on a spot array electrode under irradiation. With bulk films of various PbI2/MAPbI3 composites prepared by a spin-coating technique of mixed precursors and a one-step annealing process, the 2.5% PbI2/MAPbI3 produced an increased photocurrent density compared to pristine MAPbI3 for 2 mM benzoquinone (BQ) reduction at −0.4 V vs Fc/Fc+. As a result of the relatively high quantum yield of MAPbI3, a time-resolved photoluminescence quenching experiment could be applied to determine the electron–hole diffusion coefficients and diffusion lengths of the PbI2/MAPbI3 composites. The diffusion coefficients combined with the exciton lifetime of the pristine 2.5% PbI2/MAPbI3 (τ_PL = 103.3 ns) give the electron and hole exciton diffusion lengths, ∼300 nm. Thus, the 2.5% PbI2/MAPbI3 led to an approximately 3.0-fold increase in the diffusion length compared to a previous report of ∼100 nm for the pristine MAPbI3 perovskite. We then demonstrated that the efficiency of liquid-junction solar cells for 2.5% excess PbI2 of p-MAPbI3 was improved from 6.0% to 7.3%.